
    An improved Belief Propagation algorithm finds many Bethe states in the random field Ising model on random graphs

    We first present an empirical study of the Belief Propagation (BP) algorithm, run on the random field Ising model defined on random regular graphs in the zero-temperature limit. We introduce the notion of maximal solutions for the BP equations and use them to fix a fraction of spins in their ground-state configuration. At the phase transition point the fraction of unconstrained spins percolates and their number diverges with the system size. This in turn makes the associated optimization problem highly non-trivial in the critical region. Using the bounds on the BP messages provided by the maximal solutions, we design a new and very easy-to-implement BP scheme which is able to output a large number of stable fixed points. On the one hand, this new algorithm provides the minimum-energy configuration with high probability in a competitive time. On the other hand, we find that the number of fixed points of the BP algorithm grows with the system size in the critical region. This unexpected feature poses new relevant questions on the physics of this class of models. Comment: 20 pages, 8 figures.
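
    The zero-temperature BP update underlying this kind of scheme can be sketched as a minimal min-sum (cavity field) iteration for the random field Ising model on a random regular graph. This is a generic sketch under stated assumptions (graph size, degree, Gaussian fields, convergence tolerance), not the paper's maximal-solution bookkeeping.

```python
# Sketch: zero-temperature BP (min-sum cavity fields) for the random field
# Ising model on a random regular graph. Degree, coupling, field width, and
# the convergence tolerance are illustrative assumptions.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
N, c, J, h_std = 1000, 4, 1.0, 0.5             # spins, degree, coupling, field width
G = nx.random_regular_graph(c, N, seed=0)
H = rng.normal(0.0, h_std, size=N)             # Gaussian random fields

# one cavity field h_{i->j} per directed edge
msg = {(i, j): 0.0 for i, j in G.edges()}
msg.update({(j, i): 0.0 for i, j in G.edges()})

def u(J, h):
    # zero-temperature message function for a ferromagnetic coupling J > 0
    return np.sign(h) * min(J, abs(h))

for sweep in range(500):
    diff = 0.0
    for (i, j) in msg:
        new = H[i] + sum(u(J, msg[(k, i)]) for k in G[i] if k != j)
        diff = max(diff, abs(new - msg[(i, j)]))
        msg[(i, j)] = new
    if diff < 1e-8:
        break

# total local fields; their signs give a candidate ground-state configuration
local = np.array([H[i] + sum(u(J, msg[(k, i)]) for k in G[i]) for i in range(N)])
spins = np.sign(local)
print("sweeps:", sweep + 1, "residual:", round(diff, 10))
```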

    Deep learning via message passing algorithms based on belief propagation

    Message-passing algorithms based on the belief propagation (BP) equations constitute a well-known distributed computational scheme. They yield exact marginals on tree-like graphical models and have also proven to be effective in many problems defined on loopy graphs, from inference to optimization, from signal processing to clustering. The BP-based schemes are fundamentally different from stochastic gradient descent (SGD), on which the current success of deep networks is based. In this paper, we present and adapt to mini-batch training on GPUs a family of BP-based message-passing algorithms with a reinforcement term that biases distributions towards locally entropic solutions. These algorithms are capable of training multi-layer neural networks with performance comparable to SGD heuristics in a diverse set of experiments on natural datasets, including multi-class image classification and continual learning, while yielding improved performance on sparse networks. Furthermore, they make it possible to produce approximate Bayesian predictions that have higher accuracy than point-wise ones.
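
    The reinforcement mechanism can be sketched on a toy single-layer problem: standard BP messages for a binary perceptron, in the usual large-N Gaussian approximation, plus a self-coupling field that grows over time. The sizes, the schedule, and the single-layer setting are assumptions for illustration; the paper's algorithms train multi-layer networks with mini-batches on GPUs.

```python
# Sketch: reinforced BP for a single binary perceptron on random +-1 patterns,
# using the standard large-N Gaussian approximation for the factor messages.
# Sizes and the reinforcement schedule are illustrative assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N, P = 201, 120                                # alpha = P/N ~ 0.6
xi = rng.choice([-1.0, 1.0], size=(P, N))      # inputs
sigma = rng.choice([-1.0, 1.0], size=P)        # labels

h = np.zeros((P, N))       # variable-to-factor cavity fields h_{i->mu}
H_tot = np.zeros(N)        # reinforced total local fields

for t in range(1000):
    gamma = 1.0 - 0.99 ** (t + 1)              # reinforcement grows towards 1
    m = np.tanh(h)                             # cavity magnetizations m_{i->mu}
    M = (xi * m).sum(axis=1, keepdims=True) - xi * m               # cavity means
    V = (1 - m**2).sum(axis=1, keepdims=True) - (1 - m**2) + 1e-9  # cavity variances
    s = sigma[:, None]
    # Gaussian (CLT) approximation of the factor-to-variable messages
    u = 0.5 * (norm.logcdf(s * (M + xi) / np.sqrt(V))
               - norm.logcdf(s * (M - xi) / np.sqrt(V)))
    H_tot = u.sum(axis=0) + gamma * H_tot      # total fields with reinforcement
    h = H_tot[None, :] - u                     # new cavity fields
    w = np.sign(H_tot)
    errors = int(np.sum(sigma * (xi @ w) <= 0))
    if errors == 0:
        break

print(f"iterations: {t + 1}, training errors: {errors}")
```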

    Typical and atypical solutions in non-convex neural networks with discrete and continuous weights

    We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions, composed of an exponential number of algorithmically inaccessible small clusters for the binary case (the frozen 1-RSB phase) or of a hierarchical structure of clusters of different sizes for the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behavior is different: even beyond the disappearance of the wide flat minima, the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers, even when trained in the highly underconstrained regime of very negative margins. Comment: 34 pages, 13 figures.
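
    As a point of reference for the two models, the sketch below spells out the constraint-satisfaction problem they define: every random association must be classified with stability at least a (negative) margin kappa, with either binary or norm-constrained continuous weights. The sizes, the margin value, and the Gaussian data are illustrative assumptions.

```python
# Sketch: negative-margin perceptron constraints for binary and spherical
# weights. Sizes, margin, and the random data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
N, P, kappa = 500, 1000, -0.5                 # alpha = P/N = 2, negative margin
xi = rng.standard_normal((P, N))              # random patterns
sigma = rng.choice([-1.0, 1.0], size=P)       # random labels

def stabilities(w):
    # per-pattern stability sigma^mu * (xi^mu . w) / sqrt(N);
    # a solution needs all of them to be >= kappa
    return sigma * (xi @ w) / np.sqrt(N)

w_binary = rng.choice([-1.0, 1.0], size=N)            # binary weights
w_sphere = rng.standard_normal(N)
w_sphere *= np.sqrt(N) / np.linalg.norm(w_sphere)     # spherical constraint ||w||^2 = N

for name, w in [("binary", w_binary), ("spherical", w_sphere)]:
    st = stabilities(w)
    print(name, "min stability:", round(float(st.min()), 3),
          "fraction >= kappa:", round(float((st >= kappa).mean()), 3))
```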

    The Hidden-Manifold Hopfield Model and a learning phase transition

    The Hopfield model has a long-standing tradition in statistical physics, being one of the few neural networks for which a theory is available. Extending the theory of Hopfield models to correlated data could help understand the success of deep neural networks, for instance by describing how they extract features from data. Motivated by this, we propose and investigate a generalized Hopfield model that we name the Hidden-Manifold Hopfield Model: we generate the couplings from $P=\alpha N$ examples with the Hebb rule, using a non-linear transformation of $D=\alpha_D N$ random vectors that we call factors, with $N$ the number of neurons. Using the replica method, we obtain a phase diagram for the model that shows a phase transition where the factors hidden in the examples become attractors of the dynamics; this phase exists above a critical value of $\alpha$ and below a critical value of $\alpha_D$. We call this behaviour the learning transition.
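
    The coupling construction described above can be sketched in a few lines: examples are non-linear transformations of random combinations of the factors, and the couplings follow from the Hebb rule. The sign non-linearity, the ±1 factors, and the sizes below are assumptions made for illustration.

```python
# Sketch of a Hidden-Manifold-style coupling construction (nonlinearity,
# factor distribution, and sizes are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(3)
N = 400
alpha, alpha_D = 0.05, 0.02
P, D = int(alpha * N), int(alpha_D * N)

F = rng.choice([-1.0, 1.0], size=(N, D))       # D factors of length N
Z = rng.standard_normal((P, D))                 # latent coefficients of the examples
examples = np.sign(Z @ F.T / np.sqrt(D))        # non-linear transformation of the factors

J = (examples.T @ examples) / N                 # Hebb rule on the examples
np.fill_diagonal(J, 0.0)
print("examples:", examples.shape, "couplings:", J.shape)
```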

    Storage and Learning phase transitions in the Random-Features Hopfield Model

    The Hopfield model is a paradigmatic model of neural networks that has been analyzed for many decades in the statistical physics, neuroscience, and machine learning communities. Inspired by the manifold hypothesis in machine learning, we propose and investigate a generalization of the standard setting that we name the Random-Features Hopfield Model. Here, $P$ binary patterns of length $N$ are generated by applying a random projection followed by a non-linearity to Gaussian vectors sampled in a latent space of dimension $D$. Using the replica method from statistical physics, we derive the phase diagram of the model in the limit $P,N,D\to\infty$ with fixed ratios $\alpha=P/N$ and $\alpha_D=D/N$. Besides the usual retrieval phase, where the patterns can be dynamically recovered from some initial corruption, we uncover a new phase where the features characterizing the projection can be recovered instead. We call this phenomenon the learning phase transition, as the features are not explicitly given to the model but rather are inferred from the patterns in an unsupervised fashion.
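
    A quick way to see the two kinds of recovery discussed above is to run zero-temperature dynamics on couplings built from projected latent vectors and monitor the overlap with a stored pattern versus the overlap with the hidden features. The sizes, the corruption level, and the sign non-linearity are assumptions made for illustration.

```python
# Sketch: zero-temperature dynamics on Hebbian couplings built from projected
# latent vectors, monitoring pattern retrieval versus feature recovery.
# Sizes, the corruption level, and the non-linearity are assumptions.
import numpy as np

rng = np.random.default_rng(4)
N, D, P = 400, 8, 20
F = rng.standard_normal((N, D))                 # random projection
Z = rng.standard_normal((P, D))                 # latent Gaussian vectors
patterns = np.sign(Z @ F.T / np.sqrt(D))        # projection followed by a non-linearity
J = patterns.T @ patterns / N                   # Hebb rule
np.fill_diagonal(J, 0.0)

# corrupt one pattern and run parallel sign updates
s = patterns[0] * np.where(rng.random(N) < 0.15, -1.0, 1.0)
for _ in range(50):
    s_new = np.where(J @ s >= 0.0, 1.0, -1.0)
    if np.array_equal(s_new, s):
        break
    s = s_new

m_pattern = abs(s @ patterns[0]) / N            # overlap with the stored pattern
m_feature = np.max(np.abs(s @ np.sign(F)) / N)  # best overlap with a (binarized) feature
print("pattern overlap:", round(float(m_pattern), 3),
      "feature overlap:", round(float(m_feature), 3))
```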

    Entropic gradient descent algorithms and wide flat minima

    The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities than sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes-optimal pointwise estimators which correspond to minimizers belonging to wide flat regions. These estimators can be found by applying maximum-flatness algorithms either directly to the classifier (which is norm-independent) or to the differentiable loss function used in learning. Next, we extend the analysis to the deep learning scenario through extensive numerical validations. Using two algorithms, Entropy-SGD and Replicated-SGD, that explicitly include in the optimization objective a non-local flatness measure known as local entropy, we consistently improve the generalization error for common architectures (e.g. ResNet, EfficientNet). An easy-to-compute flatness measure shows a clear correlation with test accuracy. Comment: updated version focusing on numerical experiments.
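
    The local-entropy objective used by Entropy-SGD and Replicated-SGD can be illustrated on a toy two-dimensional landscape: an inner Langevin loop estimates the mean of a Gibbs measure centred on the current weights, and the outer step moves the weights towards that mean. The toy loss, step sizes, temperature, and loop lengths below are assumptions; the paper applies the idea to deep architectures such as ResNet and EfficientNet.

```python
# Sketch: the local-entropy gradient behind Entropy-SGD, on a toy 2D
# non-convex loss. Loss, step sizes, temperature, and loop lengths are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)

def loss(w):
    # toy landscape: a broad quadratic valley dotted with narrow wells
    return 0.05 * np.sum(w**2) + np.sum(np.sin(3.0 * w) ** 2)

def grad(w, eps=1e-5):
    # numerical gradient is good enough for a 2D illustration
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

w = rng.standard_normal(2) * 3.0
gamma, eta, eta_in, T = 1.0, 0.1, 0.05, 1e-3

for outer in range(200):
    # inner Langevin loop: sample w' from exp(-loss(w') - gamma/2 ||w' - w||^2)
    # and keep a running estimate of its mean
    wp, mean_wp = w.copy(), w.copy()
    for inner in range(20):
        noise = np.sqrt(2.0 * eta_in * T) * rng.standard_normal(2)
        wp = wp - eta_in * (grad(wp) + gamma * (wp - w)) + noise
        mean_wp = 0.75 * mean_wp + 0.25 * wp
    # outer step follows the local-entropy gradient gamma * (w - <w'>)
    w = w - eta * gamma * (w - mean_wp)

print("final w:", np.round(w, 3), "loss:", round(float(loss(w)), 4))
```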

    The star-shaped space of solutions of the spherical negative perceptron

    Empirical studies of the landscape of neural networks have shown that low-energy configurations are often found in complex connected structures, where zero-energy paths between pairs of distant solutions can be constructed. Here we consider the spherical negative perceptron, a prototypical non-convex neural network model framed as a continuous constraint satisfaction problem. We introduce a general analytical method for computing energy barriers in the simplex with vertex configurations sampled at equilibrium. We find that in the over-parameterized regime the solution manifold displays simple connectivity properties. There exists a large geodesically convex component that is attractive for a wide range of optimization dynamics. Inside this region we identify a subset of atypical high-margin solutions that are geodesically connected with most other solutions, giving rise to a star-shaped geometry. We analytically characterize the organization of the connected space of solutions and show numerical evidence of a transition, at larger constraint densities, where the aforementioned simple geodesic connectivity breaks down. Comment: 27 pages, 16 figures, comments are welcome.
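
    The kind of measurement behind these connectivity statements can be sketched as follows: find two solutions of the spherical negative perceptron with a crude projected subgradient solver, interpolate between them along the geodesic on the sphere, and record the largest number of violated constraints along the path. The solver, sizes, margin, and path discretization are assumptions; the paper's analytical method samples vertex configurations at equilibrium and works in the full simplex.

```python
# Sketch: energy (number of violated constraints) along the geodesic between
# two solutions of a spherical negative perceptron. Solver, sizes, margin,
# and the 51-point discretization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(6)
N, P, kappa = 200, 200, -0.3
xi = rng.standard_normal((P, N))
sigma = rng.choice([-1.0, 1.0], size=P)

def energy(w):
    # number of patterns whose stability falls below the margin
    return int(np.sum(sigma * (xi @ w) / np.sqrt(N) < kappa))

def solve(seed):
    # crude projected subgradient descent on a hinge loss, on the sphere ||w||^2 = N
    w = np.random.default_rng(seed).standard_normal(N)
    w *= np.sqrt(N) / np.linalg.norm(w)
    for _ in range(3000):
        stab = sigma * (xi @ w) / np.sqrt(N)
        viol = stab < kappa
        if not viol.any():
            break
        w += 0.1 * (sigma[viol, None] * xi[viol]).sum(axis=0) / np.sqrt(N)
        w *= np.sqrt(N) / np.linalg.norm(w)
    return w

w1, w2 = solve(10), solve(11)
print("endpoint energies:", energy(w1), energy(w2))

theta = np.arccos(np.clip(w1 @ w2 / N, -1.0, 1.0))
barrier = 0
for t in np.linspace(0.0, 1.0, 51):
    # geodesic (great-circle) interpolation keeps the norm fixed at sqrt(N)
    w = (np.sin((1 - t) * theta) * w1 + np.sin(t * theta) * w2) / np.sin(theta)
    barrier = max(barrier, energy(w))
print("max violated constraints along the geodesic:", barrier)
```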

    Consensus guidelines for the use and interpretation of angiogenesis assays

    The formation of new blood vessels, or angiogenesis, is a complex process that plays important roles in growth and development, tissue and organ regeneration, as well as numerous pathological conditions. Angiogenesis proceeds through multiple discrete steps that can be individually evaluated and quantified by a large number of bioassays. These independent assessments hold advantages but also have limitations. This article describes in vivo, ex vivo, and in vitro bioassays that are available for the evaluation of angiogenesis and highlights critical aspects that are relevant for their execution and proper interpretation. As such, this collaborative work is the first edition of consensus guidelines on angiogenesis bioassays, intended to serve as a current and future reference.

    Unveiling the structure of wide flat minima in neural networks

    The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat minima arise as complex extensive structures, from the coalescence of minima around "high-margin" (i.e., locally robust) configurations. Despite being exponentially rare compared to zero-margin ones, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies. Comment: 15 pages, 8 figures.
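
    A hands-on proxy for the flatness discussed above is to take a binary perceptron solution and measure how often random weight flips at a given Hamming distance still leave every pattern correctly classified. The crude solution search, the sizes, and the flip fractions below are assumptions made for illustration; this is not the paper's local-entropy computation.

```python
# Sketch: probe flatness around a binary perceptron solution by flipping a
# random fraction of the weights and checking whether all patterns remain
# correctly classified. Sizes, the crude solver, and the flip fractions are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
N, P = 301, 90
xi = rng.choice([-1.0, 1.0], size=(P, N))
sigma = rng.choice([-1.0, 1.0], size=P)

def is_solution(w):
    return bool(np.all(sigma * (xi @ w) > 0.0))

def flat_fraction(w, flip_fraction, samples=200):
    # fraction of random perturbations at a given Hamming distance that remain solutions
    k, hits = int(flip_fraction * N), 0
    for _ in range(samples):
        idx = rng.choice(N, size=k, replace=False)
        wp = w.copy()
        wp[idx] *= -1.0
        hits += is_solution(wp)
    return hits / samples

# crude search: repeatedly reinforce the most violated pattern, then clip
h = rng.standard_normal(N)
w = np.where(h >= 0.0, 1.0, -1.0)
for _ in range(5000):
    stab = sigma * (xi @ w)
    mu = int(np.argmin(stab))
    if stab[mu] > 0:
        break
    h += 2.0 * sigma[mu] * xi[mu] / np.sqrt(N)
    w = np.where(h >= 0.0, 1.0, -1.0)

if is_solution(w):
    for f in (0.01, 0.02, 0.05):
        print(f"flip fraction {f:.2f}: still-solution fraction {flat_fraction(w, f):.2f}")
else:
    print("the crude search did not find a solution; rerun or lower P")
```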